TITLE OF THE INVENTION
DIGITAL WATERMARK EMBEDDING METHOD, DIGITAL WATERMARK
EXTRACTION METHOD, DIGITAL WATERMARK EMBEDDING
APPARATUS, AND DIGITAL WATERMARK EXTRACTION APPARATUS

FIELD OF THE INVENTION 

The present invention relates to a technique for 
embedding a digital watermark in a document image, and 
also to a technique for extracting the embedded digital 
watermark.



BACKGROUND OF THE INVENTION 

Digital watermarking has attracted a lot of attention
as a copyright protection method for distributing
digital data such as image data, audio data, and the
like on the Internet. Digital watermarking is a
technique for embedding information so as to be
imperceptible to a human being. For example, as a
digital watermarking technique for a multi-valued
image, various methods that exploit the redundancy of
the density values of multi-valued pixels are known.

On the other hand, a binary image such as a
document image has small redundancy, and it is
difficult to apply the digital watermarking technique
to such an image. However, some digital watermarking
methods that exploit unique features of document images
are known. Examples include a method of shifting the
baseline of a line (e.g., see Japanese Patent
No. 3,136,061), a method of manipulating an inter-word
space length (e.g., see U.S. Patent No. 6,086,706 and
Japanese Patent Laid-Open No. 9-186603 (U.S. Patent
No. 5,861,619)), a method of manipulating an
inter-character space length (e.g., see "Electronic
document data hiding technique using inter-character
space", The 1998 IEEE Asia-Pacific Conf. on Circuits
and Systems, 1998, pp. 419-422), and a method of
rotating a character to change its inclination (e.g.,
see Yasuhiro Nakamura & Kineo Matsui, "Digital
Watermarking onto Japanese Documents by Seal Image",
IPSJ Journal Vol. 38, No. 11, Nov. 1997).

However, since a document image has small
redundancy, and the conventional methods proposed so
far embed information by changing two variables, i.e.,
the baseline of a line, inter-word space, or rotation
of a character, the changed points stand out (i.e.,
image quality deteriorates considerably). For this
reason, embedding of information in a document image
may be detected by a third party.



SUMMARY OF THE INVENTION 
The present invention has been made in
consideration of the aforementioned problems, and has
as its object to provide a technique that can embed a
digital watermark data sequence in a document image
while suppressing deterioration of the image quality.

In order to achieve the above object, for example,
an apparatus of the present invention comprises the
following arrangement.

That is, an apparatus for embedding a digital
watermark in a document image, comprising:

outer shape detection means for detecting outer
shapes of characters in the document image, which
include first and third outer shapes in a first line
serving as a reference line, a second outer shape in a
second line as a line other than the reference line,
and a fourth outer shape in a third line as a line
other than the reference line; and

control means for controlling at least one of
outer shapes included in respective pairs so as to set
a parameter between the first and second outer shapes
and a parameter between the third and fourth outer
shapes to be different from each other in accordance
with digital watermark information to be embedded.

In order to achieve the above object, for example,
a method of the present invention comprises the
following arrangement.

That is, a method for embedding a digital
watermark in a document image, comprising:

an outer shape detection step of detecting outer
shapes of characters in the document image, which
include first and third outer shapes in a first line
serving as a reference line, a second outer shape in a
second line as a line other than the reference line,
and a fourth outer shape in a third line as a line
other than the reference line; and

a control step of controlling at least one of
outer shapes included in respective pairs so as to set
a parameter between the first and second outer shapes
and a parameter between the third and fourth outer
shapes to be different from each other in accordance
with digital watermark information to be embedded.

In order to achieve the above object, for example,
an apparatus of the present invention comprises the
following arrangement.

That is, an apparatus for embedding a digital
watermark in a document image, comprising:

outer shape detection means for detecting outer
shapes of characters in the document image;

reference calculation means for setting
references at given intervals in a column direction;
and

control means for controlling at least one of
second and third outer shapes of outer shapes in a line
of interest, so as to set a parameter between the
reference, which is located between a first outer shape
and the second outer shape that neighbors the first
outer shape, and the second outer shape, and a
parameter between the reference, which is located
between the second outer shape and the third outer
shape that neighbors the second outer shape, and the
third outer shape, to be different from each other in
accordance with digital watermark information to be
embedded.

In order to achieve the above object, for example,
a method of the present invention comprises the
following arrangement.

That is, a method for embedding a digital
watermark in a document image, comprising:

an outer shape detection step of detecting outer
shapes of characters in the document image;

a reference calculation step of setting
references at given intervals in a column direction;
and

a control step of controlling at least one of
second and third outer shapes of outer shapes in a line
of interest, so as to set a parameter between the
reference, which is located between a first outer shape
and the second outer shape that neighbors the first
outer shape, and the second outer shape, and a
parameter between the reference, which is located
between the second outer shape and the third outer
shape that neighbors the second outer shape, and the
third outer shape, to be different from each other in
accordance with digital watermark information to be
embedded.

In order to achieve the above object, for example,
an apparatus of the present invention comprises the
following arrangement.

That is, an apparatus for embedding a digital
watermark in a document image, comprising:

outer shape detection means for detecting outer
shapes of characters in the document image;

reference calculation means for setting
references at given intervals in a column direction;
and

control means for controlling a second outer
shape of outer shapes in a line of interest, so as to
set a parameter between the reference, which is located
between a first outer shape and the second outer shape
that neighbors the first outer shape, and the second
outer shape, to be different from each other in
accordance with digital watermark information to be
embedded.

In order to achieve the above object, for example,
a method of the present invention comprises the
following arrangement.

That is, a method for embedding a digital
watermark in a document image, comprising:

an outer shape detection step of detecting outer
shapes of characters in the document image;

a reference calculation step of setting
references at given intervals in a column direction;
and

a control step of controlling a second outer
shape of outer shapes in a line of interest, so as to
set a parameter between the reference, which is located
between a first outer shape and the second outer shape
that neighbors the first outer shape, and the second
outer shape, to be different from each other in
accordance with digital watermark information to be
embedded.

In order to achieve the above object, for example,
an apparatus of the present invention comprises the
following arrangement.

That is, an apparatus for embedding a digital
watermark in a document image, comprising:

outer shape detection means for detecting outer
shapes of characters in the document image;

reference calculation means for setting
references at given intervals in a column direction;
and

control means for controlling to set a parameter
between a first reference position calculated by the
reference calculation means, and a first circumscribing
rectangle in a first line, and a parameter between a
second reference position calculated by the reference
calculation means and a second circumscribing rectangle
in a second line, to be different from each other in
accordance with digital watermark information to be
embedded.

In order to achieve the above object, for example,
a method of the present invention comprises the
following arrangement.

That is, a method for embedding a digital
watermark in a document image, comprising:

an outer shape detection step of detecting outer
shapes of characters in the document image;

a reference calculation step of setting
references at given intervals in a column direction;
and

a control step of controlling to set a parameter
between a first reference position calculated in the
reference calculation step, and a first circumscribing
rectangle in a first line, and a parameter between a
second reference position calculated in the reference
calculation step and a second circumscribing rectangle
in a second line, to be different from each other in
accordance with digital watermark information to be
embedded.

In order to achieve the above object, for example,
an apparatus of the present invention comprises the
following arrangement.

That is, an apparatus for extracting data
embedded in a document image, comprising:

setting means for setting a first line;

outer shape detection means for detecting outer
shapes of characters in the document image, which
include first and third outer shapes in the first line
serving as a reference line, a second outer shape in a
second line as a line other than the reference line,
and a fourth outer shape in a third line as a line
other than the reference line; and

extraction means for comparing a parameter
between the first and second outer shapes and a
parameter between the third and fourth outer shapes,
and extracting data according to a comparison result of
the parameters as the data embedded in the document
image.

In order to achieve the above object, for example,
a method of the present invention comprises the
following arrangement.

That is, a method for extracting data embedded
in a document image, comprising:

a setting step of setting a first line;

an outer shape detection step of detecting outer
shapes of characters in the document image, which
include first and third outer shapes in the first line
serving as a reference line, a second outer shape in a
second line as a line other than the reference line,
and a fourth outer shape in a third line as a line
other than the reference line; and

an extraction step of comparing a parameter
between the first and second outer shapes and a
parameter between the third and fourth outer shapes,
and extracting data according to a comparison result of
the parameters as the data embedded in the document
image.

In order to achieve the above object, for example,
an apparatus of the present invention comprises the
following arrangement.

That is, an apparatus for extracting data
embedded in a document image, comprising:

outer shape detection means for detecting outer
shapes of characters in the document image;

reference calculation means for setting
references at given intervals in a column direction;
and

extraction means for comparing a first parameter
between the reference, which is located between a first
outer shape and a second outer shape that neighbors the
first outer shape, and the second outer shape, and a
second parameter between the reference which is located
between the second outer shape and a third outer shape
that neighbors the second outer shape, and the third
outer shape, of the outer shapes in a line of interest,
and extracting data corresponding to a comparison
result of the parameters as data embedded using the
first and second parameters.

In order to achieve the above object, for example,
a method of the present invention comprises the
following arrangement.

That is, a method for extracting data embedded
in a document image, comprising:

an outer shape detection step of detecting outer
shapes of characters in the document image;

a reference calculation step of setting
references at given intervals in a column direction;
and

an extraction step of comparing a first parameter
between the reference, which is located between a first
outer shape and a second outer shape that neighbors the
first outer shape, and the second outer shape, and a
second parameter between the reference which is located
between the second outer shape and a third outer shape
that neighbors the second outer shape, and the third
outer shape, of the outer shapes in a line of interest,
and extracting data corresponding to a comparison
result of the parameters as data embedded using the
first and second parameters.
In order to achieve the above object, for example,
an apparatus of the present invention comprises the
following arrangement.

That is, an apparatus for extracting data
embedded in a document image, comprising:

outer shape detection means for detecting outer
shapes of characters in the document image;

reference calculation means for setting
references at given intervals in a column direction;
and

extraction means for comparing a first parameter
between a first reference position calculated by the
reference calculation means, and a first circumscribing
rectangle in a first line, and a second parameter
between a second reference position calculated by the
reference calculation means, and a second
circumscribing rectangle in a second line, and
extracting data corresponding to a comparison result of
the parameters as data embedded using the first and
second parameters.

In order to achieve the above object, for example,
a method of the present invention comprises the
following arrangement.

That is, a method for extracting data embedded
in a document image, comprising:

an outer shape detection step of detecting outer
shapes of characters in the document image;

a reference calculation step of setting
references at given intervals in a column direction;
and

an extraction step of comparing a first parameter
between a first reference position calculated in the
reference calculation step, and a first circumscribing
rectangle in a first line, and a second parameter
between a second reference position calculated in the
reference calculation step, and a second circumscribing
rectangle in a second line, and extracting data
corresponding to a comparison result of the parameters
as data embedded using the first and second parameters.
In order to achieve the above object, for example,
an apparatus of the present invention comprises the
following arrangement.

That is, an apparatus for extracting data
embedded in a document image, comprising:

outer shape detection means for detecting outer
shapes of characters in the document image;

reference calculation means for setting
references at given intervals in a column direction;
and

extraction means for comparing a parameter
between the reference, which is located between a first
outer shape and the second outer shape that neighbors
the first outer shape, and the second outer shape, of
the outer shapes in a line of interest, each other,
and extracting data corresponding to a comparison
result of the parameters as data embedded in the
document image.
In order to achieve the above object, for example,
a method of the present invention comprises the
following arrangement.

That is, a method for extracting data embedded
in a document image, comprising:

an outer shape detection step of detecting outer
shapes of characters in the document image;

a reference calculation step of setting
references at given intervals in a column direction;
and

an extraction step of comparing a parameter
between the reference, which is located between a first
outer shape and the second outer shape that neighbors
the first outer shape, and the second outer shape, of
the outer shapes in a line of interest, each other,
and extracting data corresponding to a comparison
result of the parameters as data embedded in the
document image.

Other features and advantages of the present
invention will be apparent from the following
description taken in conjunction with the accompanying
drawings, in which like reference characters designate
the same or similar parts throughout the figures
thereof.

BRIEF DESCRIPTION OF THE DRAWINGS 
The accompanying drawings, which are incorporated
in and constitute a part of the specification,
illustrate embodiments of the invention and, together
with the description, serve to explain the principles
of the invention.

Fig. 1 is a view for explaining the method of
embedding a digital watermark data sequence according
to the first embodiment of the present invention;

Fig. 2 is a view showing an example of formation
of pairs;

Fig. 3 is a block diagram showing the basic
arrangement of a computer which serves as a digital
watermark embedding apparatus, and a digital watermark
extraction apparatus for extracting a digital watermark
data sequence from a document image embedded with the
digital watermark data sequence according to the third
embodiment of the present invention;

Fig. 4 is a flow chart of the process for
embedding a digital watermark data sequence according
to the first embodiment of the present invention;

Fig. 5 is a flow chart of the process for
extracting a digital watermark data sequence according
to the first embodiment of the present invention;

Fig. 6 is a view for explaining a digital
watermark embedding method according to the second
embodiment of the present invention;

Fig. 7 is a view for explaining a method of
embedding more digital watermark data using
circumscribing rectangles, which are not used in the
digital watermark embedding method according to the
second embodiment of the present invention;

Fig. 8 is a view for explaining a digital 
watermark embedding method according to the third 
embodiment of the present invention; 

Fig. 9 is a view for explaining a method of
embedding more digital watermark data using
circumscribing rectangles, which are not used in the
digital watermark embedding method according to the
third embodiment of the present invention;

Fig. 10 is a view for explaining a case wherein
lines include different numbers of characters, i.e.,
circumscribing rectangles;

Fig. 11 is a view for explaining a digital
watermark embedding method according to the fourth
embodiment of the present invention;

Fig. 12 is a view for explaining a digital
watermark embedding method according to the fifth
embodiment of the present invention;

Fig. 13 is a flow chart of the digital watermark
embedding process according to the fourth embodiment of
the present invention; and

Fig. 14 is a flow chart of the digital watermark
extraction process according to the fourth embodiment
of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Preferred embodiments of the present invention
will now be described in detail in accordance with the
accompanying drawings.

[First Embodiment] 
A method of embedding a digital watermark data
sequence according to this embodiment will be described
below using Fig. 1. Fig. 1 is a view for explaining
the method of embedding a digital watermark data
sequence according to this embodiment.

Rectangles A1 to A7 and B1 to B7 indicate
circumscribing rectangles of characters in a document
image. Circumscribing rectangles A1 to A7 are those of
characters of the A-th line in the document image.
Likewise, circumscribing rectangles B1 to B7 are those
of characters of the B-th line in the document image.
These circumscribing rectangles are extracted using a
document analysis technique.

The circumscribing rectangle of each character is
a rectangle that circumscribes the character, and
serves as information indicating a region which is to
undergo character recognition. To obtain the
circumscribing rectangles of characters, the pixel
values of a document image are first projected onto the
vertical coordinate axis, and blank portions (portions
where no black pixels are present) are searched for to
segment the document image into lines (character
strings that line up horizontally). After that, each
line is projected onto the horizontal coordinate axis
to search for blank portions, thus segmenting the line
into characters. In this way, respective characters
can be extracted (detected) as circumscribing
rectangles. For this purpose, a method disclosed in,
e.g., Japanese Patent Laid-Open No. 6-68301 (U.S.
Patent No. 5,680,479) may be used.
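
The two-pass projection described above can be sketched as follows. This is only an illustrative sketch, not the method of the cited patent: it assumes the document image is a list of pixel rows with 1 for black and 0 for white, and the names `runs` and `char_rects` are hypothetical. The final tightening of each rectangle to the rows that actually contain ink is also an assumption of this sketch.

```python
def runs(profile):
    """Return (start, end) spans of consecutive non-zero entries; end is exclusive."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a non-blank run begins
        elif not value and start is not None:
            spans.append((start, i))       # a blank entry closes the run
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def char_rects(img):
    """Segment a binary image (list of rows, 1 = black) into lines, then characters.

    Returns one list per text line; each entry is a rectangle
    (left, top, right, bottom) with exclusive right and bottom.
    """
    lines = []
    row_profile = [sum(row) for row in img]            # projection onto the vertical axis
    for top, bottom in runs(row_profile):
        band = img[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]  # projection onto the horizontal axis
        rects = []
        for left, right in runs(col_profile):
            # Assumption of this sketch: shrink the box to the rows that contain ink.
            inked = [y for y in range(top, bottom) if any(img[y][left:right])]
            rects.append((left, inked[0], right, inked[-1] + 1))
        lines.append(rects)
    return lines
```

Characters that touch or overlap horizontally would be merged by this simple blank-column search; handling such cases is outside the scope of the sketch.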

In the following description, the m-th
circumscribing rectangle from the leftmost one in
Fig. 1 in the n-th line from the uppermost one in
Fig. 1 may be expressed as circumscribing rectangle n-m.
In Fig. 1, reference numeral 101 denotes a distance
between the right edges of circumscribing rectangles A1
and B2; 102, a distance between the right edges of
circumscribing rectangles A3 and B4; and 103, a
distance between the right edges of circumscribing
rectangles A5 and B6. As described above, the method
of embedding a digital watermark data sequence
according to this embodiment changes these distances in
accordance with data to be embedded.
The method of embedding the digital watermark
data sequence will be described below. Fig. 3 shows
the basic arrangement of a computer which serves as a
digital watermark embedding apparatus, and also a
digital watermark extraction apparatus for extracting a
digital watermark data sequence from a document image
embedded with the digital watermark data sequence
according to this embodiment. Note that use of all
blocks shown in Fig. 3 is not indispensable to
implement the embedding method and an extraction method
to be described later.

Referring to Fig. 3, a computer 301 is a
prevalent personal computer or workstation, and can
receive, edit, and save an image scanned by a scanner
317. Also, the computer 301 can print an image scanned
by the scanner 317 on a print medium such as a paper
sheet, OHP film, or the like using a printer 316. Note
that various user instructions can be input using a
mouse 313 and keyboard 314.

In the computer 301, respective blocks to be
described below are connected via a bus 307 and can
exchange various data. An MPU 302 controls the
operations of respective blocks in the computer 301,
and executes programs stored in a main memory 303,
which comprises a RAM, so as to implement a series of
processes associated with embedding of a digital
watermark data sequence (to be described later) and a
series of processes for extracting a digital watermark
data sequence embedded in a document image by this
embedding process.

The main memory 303 comprises an area for
temporarily storing programs and data loaded from an
HDD 304, CD-ROM drive 309, DVD-ROM drive 310, FD drive
311, and the like, and also a work area for temporarily
storing data to be processed when the MPU 302 executes
various processes.

The hard disk drive (HDD) 304 can pre-store
programs and document image data to be loaded onto the
main memory 303, and can store processed document image
data. An interface (I/F) 315 is connected to the
scanner 317, which scans information recorded on a
document, film, or the like, and generates image data,
and is used to input image data scanned by the scanner
317. An I/F 308 is connected to the printer 316 which
prints image data, and transmits image data to be
printed to the printer 316.

The CD-ROM drive 309 can read out data stored in
a CD-ROM (CD-R/CD-RW) as one of external storage media,
and can write data on the CD-R/CD-RW. The FD (floppy®
disk) drive 311 can read out data from an FD and can
write data on the FD as in the CD-ROM drive 309. The
DVD-ROM drive 310 can read out data from a DVD and can
write data on the DVD as in the FD drive 311. When
programs or printer drivers are stored in the CD-ROM,
FD, DVD-ROM, and the like, these programs are installed
on the HDD 304, and are loaded onto the main memory 303
as needed.

An I/F 312 is connected to the mouse 313 and
keyboard 314 to receive input instructions from them.
A monitor 306 is a display device which can display an
extraction process result of a digital watermark data
sequence and its progress. Furthermore, a video
controller 305 transmits display data to the monitor
306.

The digital watermark data sequence embedding
process to be executed by the computer with the above
arrangement (by the MPU 302 in practice) will be
described below with reference to Fig. 4 which is the
flow chart of that process. The progress of the
following processes may be displayed on the monitor 306
as needed.

A document image in which a digital watermark is
to be embedded is loaded onto the main memory 303 in
response to a user's input instruction using the mouse
313 or keyboard 314 (step S400). Assume that this
document image is obtained by scanning a print medium
such as a paper sheet or the like on which a document
is printed, and converting the scan result into bitmap
data. However, the method of obtaining a document
image is not limited to this specific method. For
example, document data created by a general document
editor or document data which is loaded from the CD-ROM
drive 309, DVD-ROM drive 310, or FD drive 311 onto the
main memory 303 may be converted into bitmap data to
generate a document image. Also, the apparatus may
comprise a network I/F that can connect to a network
such as a LAN, the Internet, or the like, and may
externally receive and obtain a document image via the
network. In any of the above cases, a document image
is bitmap data.

The document image as bitmap data undergoes the
aforementioned document analysis to obtain
circumscribing rectangles of characters (step S401).
When the user inputs a digital watermark data sequence
consisting of 1 or 0 using the keyboard 314 or mouse
313, this data sequence is output to the main memory
303 via the I/F 312, and is stored in the main memory
303 (step S402).

The distance between the right edges of
circumscribing rectangles in a pair (first pair) of
rectangles n-m and (n+1)-(m+1) is calculated as d1.
Taking Fig. 1 as an example, distance d1 corresponds to,
e.g., the distance 101 between the right edges of
circumscribing rectangles A1 and B2. Also, the
distance between the right edges of circumscribing
rectangles in a pair (second pair) of rectangles
n-(m+2) and (n+1)-(m+3) is calculated as d2. Taking
Fig. 1 as an example, distance d2 corresponds to, e.g.,
the distance 102 between the right edges of
circumscribing rectangles A3 and B4. That is, these
distances d1 and d2 are calculated in step S403.
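
As an illustration of step S403, the following minimal sketch computes d1 and d2 for one pair of lines. It assumes each circumscribing rectangle is a tuple (left, top, right, bottom) and that indices are 0-based, whereas the description counts lines and rectangles from 1; the function names are hypothetical.

```python
def right_edge_distance(rect_a, rect_b):
    """Horizontal distance between the right edges of two rectangles."""
    return abs(rect_b[2] - rect_a[2])


def pair_distances(lines, n, m):
    """Return (d1, d2) for reference line n and the following line n + 1.

    First pair:  rectangle m     of line n and rectangle m + 1 of line n + 1.
    Second pair: rectangle m + 2 of line n and rectangle m + 3 of line n + 1.
    Indices are 0-based here (an assumption of this sketch).
    """
    ref, other = lines[n], lines[n + 1]
    d1 = right_edge_distance(ref[m], other[m + 1])
    d2 = right_edge_distance(ref[m + 2], other[m + 3])
    return d1, d2
```

With the rectangles of Fig. 1 stored as `lines = [A, B]`, the call `pair_distances(lines, 0, 0)` would correspond to the distances 101 and 102.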

If data to be embedded is 1, the flow advances to
step S405 to execute one or a combination of the
following two change processes so as to satisfy d1 > d2
(step S405).

• The size of circumscribing rectangle B2 in the
column direction is increased or the size of
circumscribing rectangle B4 in the column direction is
decreased (a change in size).

• The position of circumscribing rectangle B2 is
moved toward the circumscribing rectangle B3 side or
the position of circumscribing rectangle B4 is moved
toward the circumscribing rectangle B3 side (a change
in position).

An instruction for one or a combination of these
two change processes to be executed may be determined
in advance or may be input by the user.

On the other hand, if data to be embedded is 0,
the flow advances to step S406 to execute one or a
combination of the following two change processes so as
to satisfy d1 < d2 (step S406).

• The size of circumscribing rectangle B2 in the
column direction is decreased or the size of
circumscribing rectangle B4 in the column direction is
increased (a change in size).

• The position of circumscribing rectangle B2 is
moved toward the circumscribing rectangle B1 side or
the position of circumscribing rectangle B4 is moved
toward the circumscribing rectangle B5 side (a change
in position).

An instruction for one or a combination of these
two change processes to be executed may be
determined in advance or may be input by the user.
Also, upon execution of the control process that
changes the position and/or size of the circumscribing
rectangle, the position and/or size of the character
circumscribed by the circumscribing rectangle is
changed accordingly.

Circumscribing rectangles to be changed in the
above position change process and/or size change
process are not limited to those described above, and
one of d1 > d2 and d1 < d2 need only be met in
correspondence with information to be embedded.

The change process in step S405 or S406 is
executed so as to make the changed portion
inconspicuous, i.e., to minimize deterioration of the
image quality.
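
The position change of steps S405 and S406 can be sketched as below. This is a minimal illustration under several assumptions that are not part of the description: rectangles are (left, top, right, bottom) tuples with 0-based indices, each rectangle in the second line lies to the right of its partner in the reference line (as drawn in Fig. 1), only the position change is used (not the size change), and the required shift is split between the two rectangles of the second line. The names `embed_bit`, `shift`, and `margin` are hypothetical.

```python
def shift(rect, dx):
    """Move a rectangle horizontally by dx pixels."""
    left, top, right, bottom = rect
    return (left + dx, top, right + dx, bottom)


def embed_bit(ref_line, other_line, m, bit, margin=1):
    """Return a copy of other_line adjusted so that d1 > d2 (bit 1) or d1 < d2 (bit 0).

    d1 uses rectangle m of ref_line and m + 1 of other_line;
    d2 uses rectangle m + 2 of ref_line and m + 3 of other_line.
    margin is the minimum difference between d1 and d2 after embedding.
    """
    out = list(other_line)
    d1 = out[m + 1][2] - ref_line[m][2]        # signed right-edge offsets
    d2 = out[m + 3][2] - ref_line[m + 2][2]
    sign = 1 if bit == 1 else -1
    shortage = margin - sign * (d1 - d2)       # how far the inequality is from holding
    if shortage > 0:
        first = (shortage + 1) // 2            # part taken by rectangle m + 1
        second = shortage - first              # remainder taken by rectangle m + 3
        out[m + 1] = shift(out[m + 1], sign * first)
        out[m + 3] = shift(out[m + 3], -sign * second)
    return out
```

Splitting the shift over two rectangles keeps each individual displacement small, in the spirit of making the change inconspicuous. The sketch does not check for collisions with neighboring rectangles and does not move the character pixels themselves, both of which an actual implementation would need to handle.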

Referring back to Fig. 4, if data to be embedded
still remains, the flow returns to step S403 to repeat
the above processes. For example, in step S403 of the
next pass, the distance between the right edges of
circumscribing rectangles in a pair (first pair) of
circumscribing rectangles n-(m+4) and (n+1)-(m+5) is
calculated as d1, and the distance between the right
edges of circumscribing rectangles in a pair (second
pair) of circumscribing rectangles n-(m+6) and
(n+1)-(m+7) is calculated as d2. Then, the processes in
step S404 and subsequent steps are repeated.
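
As a rough illustration of the position change in steps S405 and S406, the following Python sketch computes how far the lower circumscribing rectangle of the second pair would have to be moved horizontally so that the required relation between d1 and d2 holds. The function name, the margin parameter, and the assumption that moving this rectangle to the right by dx changes d2 to d2 + dx without affecting d1 are illustrative only and are not taken from the embodiment.

```python
def shift_for_bit(d1: float, d2: float, bit: int, margin: float = 1.0) -> float:
    """Return the horizontal shift dx for the lower rectangle of the second pair.

    Assumption: shifting that rectangle to the right by dx changes d2 to
    d2 + dx and leaves d1 untouched. `margin` (> 0) is a hypothetical safety
    gap so that the relation survives printing and scanning.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    if bit == 1:
        # Need d2 + dx <= d1 - margin; do not move if this already holds.
        return min(0.0, d1 - margin - d2)
    if bit == 0:
        # Need d2 + dx >= d1 + margin; do not move if this already holds.
        return max(0.0, d1 + margin - d2)
    raise ValueError("bit must be 0 or 1")
```

Returning the smallest shift that establishes the relation keeps the changed portion small, in line with the aim of minimizing deterioration of the image quality.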

A method of extracting a digital watermark data
sequence embedded by the aforementioned process will be
described below. As described above, the process for
extracting a digital watermark data sequence is also
executed by the computer shown in Fig. 3. Fig. 5 is a
flow chart showing the process to be executed by the
computer (the MPU 302 in practice) to extract a digital
watermark data sequence embedded by the aforementioned
process.

A document image embedded with a digital
watermark data sequence (to be referred to as a
watermarked image hereinafter) is loaded onto the main
memory 303 in response to a user's input instruction
using the mouse 313 or keyboard 314 (step S500).
Assume that this watermarked image is obtained by
scanning, using the scanner 317, a print medium such as
a paper sheet, OHP film, or the like on which a
watermarked image generated by the above embedding
process is printed by the printer 317, and converting
the scan result into bitmap data. However, the method
of obtaining a watermarked image is not limited to this
specific method. For example, the watermarked image
may be loaded from the HDD 304, CD-ROM drive 309,
DVD-ROM drive 310, or FD drive 311 onto the main memory
303. Also, the apparatus may comprise a network I/F
that can connect to a network such as a LAN, the
Internet, or the like, and may receive the watermarked
image from an external source via the network.

The watermarked image undergoes the
aforementioned document analysis to obtain
circumscribing rectangles of characters (step S501).
The process in this step is the same as that in step
S401.

Next, distance d1 between circumscribing
rectangles n-m and (n+1)-(m+1) and distance d2 between
circumscribing rectangles n-(m+2) and (n+1)-(m+3) are
calculated (step S502). If d1 > d2 (step S503), the
flow advances to step S504 to record embedded data as 1
in the main memory 303. On the other hand, if d1 < d2,
the flow advances to step S505 to record embedded data
as 0 in the main memory 303.

It is then checked if circumscribing rectangles
to be processed still remain (step S506). If they do,
for example, distance d1 between circumscribing
rectangles n-(m+4) and (n+1)-(m+5) and distance d2
between circumscribing rectangles n-(m+6) and
(n+1)-(m+7) are calculated in step S502 to repeat the
processes in step S503 and subsequent steps. If the
number of embedded digital watermark data is known in
advance, it may instead be determined whether or not
all of those data have been recorded on the main memory
303.

If it is determined in step S506 that no
circumscribing rectangle to be processed remains, the
data sequence recorded in the main memory 303 in steps
S504 and S505 is obtained as the digital watermark
data sequence. With the above process, the data
sequence can be extracted from a document image in
which the digital watermark data sequence is embedded
by the aforementioned method.
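
The extraction loop of steps S502 to S506 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each circumscribing rectangle is represented as a hypothetical (x, y, width, height) tuple, the distance is taken as the absolute horizontal distance between right edges, indices are 0-based, and the case d1 = d2, which the embodiment does not define, is treated as an error.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); a hypothetical layout


def right_edge_distance(a: Box, b: Box) -> int:
    """Absolute horizontal distance between the right edges of two rectangles."""
    return abs((a[0] + a[2]) - (b[0] + b[2]))


def extract_bits(line_n: Sequence[Box], line_n1: Sequence[Box]) -> List[int]:
    """Read one bit per group of four rectangles from lines n and n+1."""
    bits: List[int] = []
    m = 0
    # The first pair is n-m and (n+1)-(m+1); the second is n-(m+2) and (n+1)-(m+3).
    while m + 2 < len(line_n) and m + 3 < len(line_n1):
        d1 = right_edge_distance(line_n[m], line_n1[m + 1])
        d2 = right_edge_distance(line_n[m + 2], line_n1[m + 3])
        if d1 == d2:
            raise ValueError("d1 == d2 is not defined by the embodiment")
        bits.append(1 if d1 > d2 else 0)  # steps S503 to S505
        m += 4  # the next group starts at n-(m+4)
    return bits
```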

In the aforementioned embedding method of a
digital watermark into a document image, the distance
between circumscribing rectangles in different lines is
changed instead of the distance between circumscribing
rectangles in a single line, so the portions to be
changed can be distributed over the entire document
image. Hence, a change in the document image is hardly
recognized by the human eye, and deterioration of the
image quality of the document image in which the
digital watermark is embedded can be suppressed.

In this embodiment, the two circumscribing
rectangles that form one pair differ by one both in
line position and in position counted from the leftmost
rectangle of the line. However, the line positions of
the circumscribing rectangles may be spaced by two or
more lines, and the positions of the circumscribing
rectangles from the leftmost rectangles in these lines
may be spaced by two or more rectangles. Also,
respective pairs may have different positional
relationships between the circumscribing rectangles
which belong to them.

Fig. 2 shows an example of formation of pairs.
In Fig. 2, A1 and C3, A2 and C4, and A5 and C7 form
pairs. Also, distances between circumscribing
rectangles may be selected by different methods in
respective pairs. For example, the distance between
the right edge of one circumscribing rectangle and the
left edge of the other circumscribing rectangle may be
used, or either the distance between the right edges of
the two circumscribing rectangles or the distance
between the left edges of the two circumscribing
rectangles may be used. When the method of selecting
the distance is changed (e.g., for respective pairs) in
this manner, the embedding method becomes more complex,
and the secrecy of information to be embedded can be
improved. Furthermore, combinations of lines may be
complicated by selecting d1 from the distances between
circumscribing rectangles in lines A and C, and
selecting d2 from those in lines A and B.

However, when a digital watermark data sequence
embedded by the above process is extracted, information
indicating the positional relationship between
circumscribing rectangles that belong to each pair, and
information indicating the method of selecting the
distance are required for each pair (this embodiment
requires only one item of each kind of information,
since all pairs have the same positional relationship
between circumscribing rectangles and the same method
of selecting the distance).

Also, circumscribing rectangles between which
distances d1 and d2 are to be calculated may be
selected using a pseudo random number in accordance
with digital watermark data to be embedded. Taking
Fig. 1 as an example, when a pseudo random number is
"0", the distance 101 is selected as d1, and the
distance 102 is selected as d2; when a pseudo random
number is "1", the distance 101 is selected as d1, and
the distance 103 is selected as d2; and so forth.
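
One way such a selection could be made reproducible on the extraction side is sketched below in Python. The sketch assumes that the embedding and extraction sides share a seed as a key, which the embodiment does not state; the function name and the table are hypothetical, and only the two cases given for Fig. 1 are modelled.

```python
import random
from typing import List, Tuple

# Hypothetical table for the two cases described for Fig. 1:
# pseudo random number -> (distance used as d1, distance used as d2)
SELECTION_TABLE = {0: (101, 102), 1: (101, 103)}


def select_distance_pairs(key: int, count: int) -> List[Tuple[int, int]]:
    """Return `count` (d1, d2) reference-numeral pairs chosen by a seeded PRNG.

    Assumption: `key` is shared by the embedding and extraction sides so that
    both derive the same sequence of selections.
    """
    rng = random.Random(key)
    return [SELECTION_TABLE[rng.randrange(2)] for _ in range(count)]
```

Because the generator is seeded, calling the function twice with the same key yields the same sequence, which is what lets the extraction side recompute which distances were compared.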

[Second Embodiment]

In the first embodiment, two pairs of
circumscribing rectangles, i.e., four circumscribing
rectangles, are required to embed 1-bit digital
watermark data. This embodiment has as its object to
reduce the number of circumscribing rectangles used to
embed 1-bit digital watermark data, and thereby to
embed more digital watermark data than the digital
watermark embedding method according to the first
embodiment using a limited number of circumscribing
rectangles. Note that the digital watermark embedding
method according to this embodiment is executed by the
MPU 302 in the apparatus with the arrangement shown in
Fig. 3 as in the first embodiment. In addition, the
same techniques as those in the first embodiment are
basically used unless otherwise specified.

Fig. 6 is a view for explaining the digital
watermark embedding method according to this embodiment.
Referring to Fig. 6, rectangles A1 to A7 indicate
circumscribing rectangles which are arranged in a
single line as in Fig. 1, and rectangles B1 to B7 also
indicate circumscribing rectangles which are arranged
in a single line as in Fig. 1. Reference numeral 601
denotes a distance between the right edges of A1 and
B2; 602, a distance between the right edges of B2 and
A3; 603, a distance between the right edges of A3 and
B4; and 604, a distance between the right edges of A4
and B5.

The flow chart of the digital watermark embedding
process according to this embodiment basically follows
the flow shown in Fig. 4. Taking the circumscribing
rectangles shown in Fig. 6 as an example, d1 and d2 to
be calculated in step S403 are respectively the
distances 601 and 602. If data to be embedded is 1,
one or a combination of the following two change
processes is executed in step S405 to meet d1 > d2.

•The size of circumscribing rectangle A1 in the
column direction is decreased or the size of
circumscribing rectangle A3 in the column direction is
decreased (a change in size).

•The position of circumscribing rectangle B2 is
moved toward the circumscribing rectangle B3 side or
the position of circumscribing rectangle A3 is moved
toward the circumscribing rectangle A2 side (a change
in position).

On the other hand, if data to be embedded is 0,
one or a combination of the following two change
processes is executed in step S406 to meet d1 < d2.

•The size of circumscribing rectangle A1 in the
column direction is increased or the size of
circumscribing rectangle A3 in the column direction is
increased (a change in size).

•The position of circumscribing rectangle B2 is
moved toward the circumscribing rectangle B1 side or
the position of circumscribing rectangle A3 is moved
toward the circumscribing rectangle A4 side (a change
in position).

Whether one or a combination of these two change
processes is executed may be determined in advance or
may be specified by the user. Also, when the position
and/or size of a circumscribing rectangle is changed,
the position and/or size of the character circumscribed
by that rectangle is changed accordingly.

Circumscribing rectangles to be changed in the
above position change process and/or size change
process are not limited to those described above, and
one of d1 > d2 and d1 < d2 need only be met in
correspondence with the information to be embedded. In
the above process, distance d2 is preferably changed
without changing distance d1.

If it is determined in step S407 that data to be
embedded still remains, the flow returns to step S403
to repeat the aforementioned process by selecting the
distance 602 as d1 and the distance 603 as d2. In this
case, the distance 603 is changed without changing the
aforementioned relationship between the distances 601
and 602.

As described above, in the digital watermark
embedding method according to this embodiment, three
circumscribing rectangles are required to embed the
first bit, and only one new circumscribing rectangle is
used to embed each subsequent bit. Hence, when digital
watermark data is embedded using a limited number of
circumscribing rectangles, the digital watermark
embedding method according to this embodiment can embed
more data than the first embodiment.
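
The chained comparison can be illustrated with the following Python sketch, which reads bits from the right-edge positions of a zigzag sequence of rectangles such as A1, B2, A3, B4 in Fig. 6. The function name and the input representation (a list of right-edge x coordinates) are assumptions made for illustration, and the undefined case d1 = d2 is read as 0 here purely for simplicity.

```python
from typing import List, Sequence


def decode_chain(right_edges: Sequence[float]) -> List[int]:
    """Decode bits from consecutive distances along a zigzag chain.

    right_edges[k] is the right-edge x coordinate of the k-th rectangle in
    the chain (e.g. A1, B2, A3, B4, ...). Each distance after the first is
    compared with its predecessor, so n rectangles carry n - 2 bits.
    """
    dists = [abs(b - a) for a, b in zip(right_edges, right_edges[1:])]
    # Assumption: d1 == d2 is not defined by the embodiment; it is read as 0.
    return [1 if d1 > d2 else 0 for d1, d2 in zip(dists, dists[1:])]
```

With eight rectangles, this chaining carries six bits, whereas grouping the same rectangles four at a time as in the first embodiment carries two.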

The method of extracting digital watermark data
from a document image in which digital watermark data
is embedded according to the aforementioned digital
watermark embedding method is basically the same as the
first embodiment except for the method of selecting
distances d1 and d2 (the method of selecting d1 and d2
in the aforementioned digital watermark embedding
process). That is, the process according to the flow
chart shown in Fig. 5 is executed. Also, the process
for extracting a digital watermark data sequence is
executed by the computer (MPU 302) shown in Fig. 3.

Also, when circumscribing rectangles which are
not used in the digital watermark embedding method
according to this embodiment are further used, more
digital watermark data can be embedded. Fig. 7 is a
view for explaining this method. Rectangles A1 to A7
and B1 to B7 are the same as those shown in Fig. 6. In
the method described so far, circumscribing rectangles
B1, A2, B3, A4, B5, and A6 are not used to embed digital
watermark data. Hence, when the process of this
embodiment is additionally executed by selecting a
distance 701 between the right edges of B1 and A2 as
d1, and a distance 702 between the right edges of A2
and B3 as d2, as shown in Fig. 7, digital watermark
data can also be embedded using these otherwise unused
circumscribing rectangles, and more data can be
embedded.

[Third Embodiment]

The digital watermark embedding method according
to the second embodiment has a merit that it can embed
more data than that of the first embodiment. However,
since the changed positions (those to which distances
d1 and d2 are applied) are denser than in the first
embodiment, the image quality of a document image after
embedding is more likely to deteriorate.

To solve this problem, the digital watermark
embedding method according to this embodiment embeds
every bit of data using a set of three circumscribing
rectangles, and the sets, each consisting of three
circumscribing rectangles, are separated from each
other. The digital watermark embedding method according
to this embodiment will be described below using Fig. 8.
Note that the digital watermark embedding method
according to this embodiment is executed by the MPU 302
in the apparatus with the arrangement shown in Fig. 3
as in the first embodiment. In addition, the same
techniques as those in the first embodiment are
basically used unless otherwise specified.

Fig. 8 is a view for explaining the digital
watermark embedding method according to this embodiment.
Referring to Fig. 8, rectangles A1 to A7 indicate
circumscribing rectangles which are arranged in a
single line as in Fig. 6, and rectangles B1 to B7 also
indicate circumscribing rectangles which are arranged
in a single line as in Fig. 6. Reference numeral 801
denotes a distance between the right edges of A1 and
B2; 802, a distance between the right edges of B2 and
A3; 803, a distance between the right edges of A4 and
B5; and 804, a distance between the right edges of A5
and B6. The digital watermark embedding method
according to this embodiment embeds each bit of digital
watermark data using three circumscribing rectangles by
the same method as that upon embedding the first 1 bit
in the second embodiment, but the method of selecting
three circumscribing rectangles is different from the
second embodiment. That is, as shown in Fig. 8, sets
each consisting of three circumscribing rectangles (a
set of A1, B2, and A3 and a set of A4, B5, and A6 in
Fig. 8) are separated by one circumscribing rectangle.

Then, digital watermark data is embedded by
applying the same method as that upon embedding the
first 1 bit in the second embodiment to the respective
sets. At this time, circumscribing rectangles A1 and
A4 are not changed. In this way, since the changed
portions are distributed, deterioration of the image
quality of a document image after digital watermark
data is embedded can be suppressed.

The method of extracting digital watermark data
from a document image in which digital watermark data
is embedded according to the aforementioned digital
watermark embedding method is basically the same as the
first embodiment except for the method of selecting
distances d1 and d2 (the method of selecting d1 and d2
in the aforementioned digital watermark embedding
process). That is, the process according to the flow
chart shown in Fig. 5 is executed. Also, the process
for extracting a digital watermark data sequence is
executed by the computer (MPU 302) shown in Fig. 3.

Also, when circumscribing rectangles which are
not used in the digital watermark embedding method
according to this embodiment are further used, more
digital watermark data can be embedded. Fig. 9 is a
view for explaining this method. Rectangles A1 to A7
and B1 to B7 are the same as those shown in Fig. 8. In
the method described so far, circumscribing rectangles
B1, A2, B3, A4, B5, and A6 are not used to embed digital
watermark data. Hence, when the process of this
embodiment is additionally executed by selecting a
distance 901 between the right edges of B1 and A2 as
d1, and a distance 902 between the right edges of A2
and B3 as d2, as shown in Fig. 9, digital watermark
data can also be embedded using these otherwise unused
circumscribing rectangles, and more data can be
embedded.

Note that the respective sets are spaced by one
circumscribing rectangle in this embodiment. However,
the present invention is not limited to such specific
space, and that space may be changed in consideration
of the number of circumscribing rectangles arranged in
the line direction in the document image.

[Fourth Embodiment]

The first to third embodiments described above
are implemented by comparing the distances between
circumscribing rectangles in different lines. However,
this method is not efficient when respective lines have
different numbers of characters, i.e., circumscribing
rectangles, as shown in Fig. 10. For example, upon
embedding a digital watermark by combining the first
and second lines, rectangles A5 to A7, C6, and C7
cannot be used and are wasted since they have no
characters to be combined with. Hence, the digital
watermark embedding method according to this embodiment
embeds a digital watermark while minimizing wasted
circumscribing rectangles even when respective lines
have different numbers of circumscribing rectangles, as
exemplified in Fig. 10. Note that the digital watermark
embedding method according to this embodiment is
executed by the MPU 302 in the apparatus with the
arrangement shown in Fig. 3 as in the first embodiment.
In addition, the same techniques as those in the first
embodiment are basically used unless otherwise
specified.

Fig. 11 is a view for explaining the digital
watermark embedding method according to this embodiment.
Circumscribing rectangles A1 to A7, B1 to B4, and C1 to
C7 shown in Fig. 11 are the same as those shown in
Fig. 10. Referring to Fig. 11, reference numeral 1101
denotes a distance between the right edges of A1 and
B2; 1102, a distance between the right edges of A2 and
B3; 1103, a distance between the right edges of A1 and
C2; 1104, a distance between the right edges of A2 and
C3; 1105, a distance between the right edges of A3 and
C4; and 1106, a distance between the right edges of A4
and C5. The digital watermark embedding method
according to this embodiment will be described below
taking Fig. 11 as an example.

The flow chart of the digital watermark embedding
process according to this embodiment basically follows
the flow shown in Fig. 4, but which distances are to be
calculated as d1 and d2 in step S403 is different from
the above embodiments. Fig. 13 is a flow chart of the
digital watermark embedding process according to this
embodiment.

Since the processes in steps S1301 to S1303 are
the same as those in steps S400 to S402, a description
thereof will be omitted. In step S1304, a reference
line is determined. Since this reference line is a
line having the largest length, i.e., a line including
the largest number of circumscribing rectangles, the
first line (a line including circumscribing rectangles
A1 to A7) is selected in this case. More specifically,
circumscribing rectangles obtained in step S1302 are
counted for respective lines, and a line with the
largest count value is selected as the reference line.
When a plurality of lines have the largest count value,
a line closest to the first line is selected as the
reference line.
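
The reference line selection of step S1304 can be sketched in Python as follows. The function name and the representation of the document as a list of lines, each a list of circumscribing rectangles, are assumptions for illustration; line indices are 0-based, so index 0 corresponds to the first line.

```python
from typing import List, Sequence


def select_reference_line(lines: Sequence[Sequence[object]]) -> int:
    """Return the index of the line with the most circumscribing rectangles.

    When several lines share the largest count, list.index returns the
    first of them, i.e. the line closest to the first line.
    """
    if not lines:
        raise ValueError("at least one line is required")
    counts: List[int] = [len(line) for line in lines]
    return counts.index(max(counts))
```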

Furthermore, a target line is selected in step
S1305. The target line is a line other than the
reference line. In step S1305, one of the lines other
than the reference line, which is closest to the first
line (the second line in Fig. 11), is selected as the
target line to be processed.

In step S1306, the distances 1101 and 1102 are
respectively calculated as distances d1 and d2. That
is, the distances between the right edges of
circumscribing rectangles in the reference line and
those in the target line are calculated as d1 and d2.
If data to be embedded is 1, the change process of the
sizes and/or positions of circumscribing rectangles B2,
B3, and the like is executed to satisfy d1 > d2; if
data to be embedded is 0, the change process is
executed to satisfy d1 < d2. In this embodiment, the
change process is not applied to the circumscribing
rectangles in the reference line. Also, when the
position and/or size of a circumscribing rectangle is
changed, the position and/or size of the character
circumscribed by that rectangle is changed accordingly.

If it is determined in step S1310 that data to be
embedded still remains, the flow returns to step S1305.
In this case, it is checked in step S1305 if the target
line includes unused circumscribing rectangles. In the
example of Fig. 11, the circumscribing rectangles used
in the target line, i.e., the line including
circumscribing rectangles B1 to B4, are B2 and B3.
Since B1 is not used as a rectangle to be processed,
only B4 is an unused circumscribing rectangle in
practice. In this embodiment, when two or more unused
circumscribing rectangles remain, the target line
remains unchanged. However, when the number of unused
circumscribing rectangles is less than 2, the target
line is changed.

In the example of Fig. 11, since the number of
unused circumscribing rectangles is one, the target
line is shifted downward by one, and the third line,
i.e., a line including circumscribing rectangles C1 to
C7, is selected as a new target line in step S1305.
Hence, the distances 1103 and 1104 are respectively
calculated as d1 and d2 in step S1306. That is, the
distances between the right edges of circumscribing
rectangles in the reference line and those in the
target line are calculated as d1 and d2. Then, the
above processes are repeated for all lines after the
second line.
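
The rule for keeping or changing the target line can be sketched in Python as follows. This is only a sketch under assumptions: lines are identified by 0-based indices, `last_used` is a hypothetical 1-based position of the last rectangle already used in the target line (3 for B3 in Fig. 11), and skipping the reference line when moving downward is an assumption that follows from the target line being a line other than the reference line.

```python
from typing import Optional, Sequence


def next_target_line(
    target: int, reference: int, line_lengths: Sequence[int], last_used: int
) -> Optional[int]:
    """Return the target line for the next bit, or None if no line is left.

    The current target line is kept while two or more unused rectangles
    remain after the last used one; otherwise the next line below is chosen.
    """
    unused = line_lengths[target] - last_used
    if unused >= 2:
        return target
    nxt = target + 1
    if nxt == reference:  # assumption: the reference line is never a target
        nxt += 1
    return nxt if nxt < len(line_lengths) else None
```

For the arrangement of Fig. 11 (7, 4, and 7 rectangles with the first line as the reference line), `next_target_line(1, 0, [7, 4, 7], 3)` returns 2, i.e., the third line, because only B4 remains unused in the second line.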

With the above processes, although digital
watermark data cannot be embedded in the reference line,
even when lines have different numbers of
circumscribing rectangles, a larger number of
circumscribing rectangles can be used compared to the
above embodiments, thus efficiently embedding a digital
watermark.

The method of extracting digital watermark data
from a document image in which digital watermark data
is embedded according to the aforementioned digital
watermark embedding method is basically the same as the
first embodiment except for the method of selecting
distances d1 and d2 (the method of selecting d1 and d2
in the aforementioned digital watermark embedding
process). Fig. 14 is a flow chart of the digital
watermark extraction process according to this
embodiment. Also, the process for extracting a digital
watermark data sequence is executed by the computer
(MPU 302) shown in Fig. 3.

Since the processes in steps S1401 and S1402 are
the same as those in steps S500 and S501, a description
thereof will be omitted. Also, in steps S1403 to S1405,
the reference line and target line are determined, and
d1 and d2 are calculated using circumscribing
rectangles in these lines as in steps S1304 to S1306.
In step S1406 and subsequent steps, the same processes
as those in step S503 and subsequent steps are executed.

Furthermore, it is determined in step S1404 if
the target line includes unused circumscribing
rectangles (in this embodiment, if two or more unused
circumscribing rectangles remain, the target line
remains unchanged; if the number of unused
circumscribing rectangles is less than 2, the target
line is changed). With this process, data embedded by
the digital watermark embedding process according to
this embodiment can be extracted.

The position of the reference line may be given
as a key upon extracting a digital watermark. In this
case, circumscribing rectangles need not be counted for
respective lines in step S1403, and the reference line
can be determined based on this key.

In order to obtain distances d1 and d2 in this
embodiment, the distances between the right edges of
circumscribing rectangles which are shifted by one each
in the column direction are calculated. However, the
present invention is not limited to this, and the
circumscribing rectangles may be shifted by two or more
each.

In this embodiment, after the distances 1101 and
1102 are calculated, the third line is selected as the
target line. Alternatively, after the distances 1101
and 1102 are calculated, the distance 1102 may be
selected as d1, and the distance between the right
edges of A3 and B4 may be calculated as d2. Then,
another digital watermark data may be embedded using
these d1 and d2 to embed more data.

[Fifth Embodiment]

In the fourth embodiment, digital watermark data
cannot be embedded in the reference line, as described
above. This embodiment allows digital watermark data to
be embedded in all lines even when respective lines
have different numbers of circumscribing rectangles, as
exemplified in Fig. 11. Note that the digital
watermark embedding method according to this embodiment
is executed by the MPU 302 in the apparatus with the
arrangement shown in Fig. 3 as in the first embodiment.
In addition, the same techniques as those in the first
embodiment are basically used unless otherwise
specified.

Fig. 12 is a view for explaining the digital
watermark embedding process according to this
embodiment. Referring to Fig. 12, circumscribing
rectangles A1 to A4 and B1 to B7 are arranged in
respective lines. Also, K1, K2, K3, and K4 are
references set at given intervals. Pitches between K1
and K2, K2 and K3, and K3 and K4 will be respectively
referred to as basic pitches in this embodiment. Note
that this basic pitch is the average value of the
distances between the right edges of circumscribing
rectangles in all lines, but may be obtained by other
calculations.

Also, in Fig. 12, reference numeral 1201 denotes
a distance from K1 to the right edge of A2; 1202, a
distance from K2 to the right edge of A3; 1203, a
distance from K3 to the right edge of A4; 1204, a
distance from K1 to the right edge of B2; 1205, a
distance from K2 to the right edge of B3; 1206, a
distance from K3 to the right edge of B4; and 1207, a
distance from K4 to the right edge of B4. The digital
watermark embedding method according to this embodiment
will be described below taking Fig. 12 as an example.

The flow chart of the digital watermark embedding
process according to this embodiment basically follows
the flow shown in Fig. 4, but which distances are to be
calculated as d1 and d2 in step S403 is different from
the above embodiments. In this embodiment, the average
value of the distances between circumscribing
rectangles in respective lines is calculated in step
S403, and is stored in the main memory 303, HDD 304, or
the like as the basic pitch. This basic pitch is also
used as key information upon extracting a digital
watermark.

In step S403, the distances between the
references (K1, K2, K3, and K4 in Fig. 12), which are
determined based on the basic pitch and are set between
neighboring circumscribing rectangles in the column
direction in the first line, and the right edges of
circumscribing rectangles, each of which appears
immediately after the reference, are calculated. In
the example of Fig. 12, the distances 1201 and 1202 are
calculated as d1 and d2.
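
The basic pitch and the distances measured from the references can be sketched in Python as follows. The sketch assumes that each line is given as a list of right-edge x coordinates and that the references are placed at a hypothetical position `first_reference` and then at every multiple of the basic pitch; the embodiment does not specify how the position of K1 is chosen, so that parameter is an assumption.

```python
from typing import List, Sequence


def basic_pitch(lines_right_edges: Sequence[Sequence[float]]) -> float:
    """Average distance between right edges of neighboring rectangles, over all lines."""
    gaps = [b - a for edges in lines_right_edges for a, b in zip(edges, edges[1:])]
    if not gaps:
        raise ValueError("at least one pair of neighboring rectangles is required")
    return sum(gaps) / len(gaps)


def reference_distances(
    right_edges: Sequence[float], first_reference: float, pitch: float
) -> List[float]:
    """Distance from each reference K_i to the right edge of the next rectangle.

    Assumption: K_i lies at first_reference + i * pitch, and the rectangle
    immediately after K_i is the (i + 2)-th rectangle of the line, as with
    K1 and A2 or K2 and A3 in Fig. 12.
    """
    return [
        right_edges[i + 1] - (first_reference + i * pitch)
        for i in range(len(right_edges) - 1)
    ]
```

The first two values returned for the first line would play the roles of the distances 1201 and 1202, i.e., d1 and d2.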

If data to be embedded is 1, the change process
of the sizes and/or positions of circumscribing
rectangles A2 and A3 is executed to satisfy d1 > d2; if
data to be embedded is 0, the change process is
executed to satisfy d1 < d2. Also, when the position
and/or size of a circumscribing rectangle is changed,
the position and/or size of the character circumscribed
by that rectangle is changed accordingly.

If it is determined in step S407 that data to be
embedded still remains, the flow returns to step S403.
In this case, it is checked in step S403 if the line to
be processed includes unused circumscribing rectangles.
In the example of Fig. 12, in the line including
circumscribing rectangles A1 to A4, circumscribing
rectangles A2 and A3 are used. Since A1 is not used as
an object to be processed, only A4 is an unused
circumscribing rectangle in practice. In this
embodiment, when two or more unused circumscribing
rectangles remain, only the line to be processed is
successively used; when the number of unused
circumscribing rectangles is less than 2, the next line
is also selected as a line to be processed.

That is, in the example of Fig. 12, a line
including B1 to B7 is also selected as a line to be
processed to calculate the distances 1203 and 1204 as
d1 and d2, thus repeating the subsequent processes.

With the above processes, even when lines have
different numbers of circumscribing rectangles, digital
watermark data can be embedded in all the lines.

The method of extracting digital watermark data
from a document image in which digital watermark data
is embedded according to the aforementioned digital
watermark embedding method is basically the same as the
first embodiment except for the method of selecting
distances d1 and d2 (the method of selecting d1 and d2
in the aforementioned digital watermark embedding
process). That is, the process according to the flow
chart shown in Fig. 5 is executed. In step S502, the
basic pitch may be calculated as in step S403, or the
basic pitch calculated upon embedding may be loaded
from the HDD 304 or the like as a key. Then, the
distances between the references (K1, K2, K3, and K4 in
Fig. 12), which are determined based on the basic pitch
and are set between neighboring circumscribing
rectangles in the column direction, and the right edges
of circumscribing rectangles, each of which appears
immediately after the reference, are calculated. In
the example of Fig. 12, the distances 1201 and 1202 are
calculated as d1 and d2.

Furthermore, it is determined in step S502 if the
line to be processed includes unused circumscribing
rectangles (in this embodiment, if two or more unused
circumscribing rectangles remain, the line to be
processed is successively used; if the number of unused
circumscribing rectangles is less than 2, the next line
is also selected as the line to be processed). With
this process, data embedded by the digital watermark
embedding process according to this embodiment can be
extracted.

In this embodiment, when the entire document
image is enlarged or reduced in size, extraction of
information may be disabled since the method of this
embodiment uses comparison with a fixed value, i.e.,
the basic pitch, in place of relative comparison of
distances unlike in the above embodiments. However,
when the information sequence to be embedded is random,
i.e., when 1 and 0 have equal probabilities of
occurrence, the average of the distances between the
right edges of circumscribing rectangles upon embedding
is expected to be nearly equal to that upon extraction.

Therefore, when the average value is used as the
basic pitch, a process for calculating the distances
between the right edges of circumscribing rectangles
and then calculating their average may be executed in
place of storing the basic pitch. Randomization of an
information sequence can be easily realized by an
encryption process of information to be embedded. In
order to absorb offsets of the probabilities of
occurrence of 1 and 0 in the information sequence to be
embedded, several circumscribing rectangles at the end
of a document or line may be used to correct such
offset in place of using all circumscribing rectangles.
That is, for example, when an information sequence to 
be embedded in one line includes "l"s 2 bits more than 

15 "0"s, the distances between circumscribing rectangles 
up to these T bits become larger than the average, 
but the distance between the subsequent two 
circumscribing rectangles can be set to be smaller than 
the average to correct the total length of the line. 

20 Note that no information is normally embedded in last 
several circumscribing rectangles. When the embedding 
and extraction sides share information indicating that 
correction information is embedded, the extraction side 
does not extract any information from last several 

25 circumscribing rectangles. 
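
The use of the average as the basic pitch can be sketched in 
Python as follows. This is a simplified model and not the 
exact procedure of the embodiment: it assumes that the 
distances are measured between the right edges of consecutive 
circumscribing rectangles in one line, and the names 
estimate_basic_pitch and surplus_of_ones are hypothetical. 

```python
from statistics import mean


def estimate_basic_pitch(right_edges):
    """Return the average distance between consecutive right edges.

    `right_edges` holds the x coordinates of the right edges of the
    circumscribing rectangles of one line, in reading order.
    """
    gaps = [b - a for a, b in zip(right_edges, right_edges[1:])]
    if not gaps:
        raise ValueError("at least two right edges are required")
    return mean(gaps)


def surplus_of_ones(bits):
    """Return (number of 1s) - (number of 0s) in the information sequence.

    A nonzero result is the imbalance that the rectangles reserved at the
    end of the line or document would have to absorb.
    """
    return sum(1 if bit else -1 for bit in bits)
```

For right edges at 0, 12, 20, 32, and 40, the gaps are 12, 8, 
12, and 8, so the estimate is 10. If the image is enlarged by 
a factor of 1.5, the edges move to 0, 18, 30, 48, and 60 and 
the estimate becomes 15, so the comparison with the basic 
pitch still separates the larger gaps from the smaller ones. 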
[Sixth Embodiment] 

In the fifth embodiment, two circumscribing 
rectangles are used to embed 1-bit data. The digital 
watermark embedding method according to this embodiment 
embeds 1-bit data using one circumscribing rectangle. 
Note that the digital watermark embedding method 
according to this embodiment is executed by the MPU 302 
in the apparatus with the arrangement shown in Fig. 3 
as in the first embodiment. In addition, the same 
techniques as those in the first embodiment are 
basically used unless otherwise specified. 

Taking Fig. 12 as an example, the positions 
and/or sizes of, e.g., A2 and A3 are changed to embed 
1-bit data in the fifth embodiment. That is, two 
circumscribing rectangles are used to embed 1-bit data. 
In this embodiment, the distance 1201 is calculated as 
d1, and the basic pitch as d2. If data to be embedded 
is 1, the process for changing the position or size of 
circumscribing rectangle A2 is executed to satisfy d1 > 
d2; if data to be embedded is 0, that process is 
executed to satisfy d1 < d2. In this way, 1-bit data 
can be embedded using one circumscribing rectangle. 
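
The embedding rule of this embodiment can be illustrated by 
the following minimal Python sketch. The function name 
adjust_distance and the parameter margin (the minimum 
difference kept between d1 and d2) are hypothetical; the 
description only requires d1 > d2 for data 1 and d1 < d2 for 
data 0. 

```python
def adjust_distance(d1, d2, bit, margin=1.0):
    """Return the distance d1 should have after embedding one bit.

    d1: current distance for the rectangle (e.g. distance 1201 in Fig. 12)
    d2: the basic pitch, a fixed value
    margin: hypothetical safety gap so that d1 and d2 are never equal
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    if bit == 1:
        # Data 1: enforce d1 > d2, moving the rectangle only if needed.
        return max(d1, d2 + margin)
    if bit == 0:
        # Data 0: enforce d1 < d2, moving the rectangle only if needed.
        return min(d1, d2 - margin)
    raise ValueError("bit must be 0 or 1")
```

The difference between the returned value and the original d1 
is the amount by which the position or size of the 
circumscribing rectangle would have to be changed. 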

The flow chart of the digital watermark embedding 
process according to this embodiment basically follows 
the flow shown in Fig. 4, but the distances calculated 
as d1 and d2 in step S403 differ from those in the 
above embodiment. In this embodiment, distance d2 
need not be calculated in every process since it is a 
fixed value. Since distance d2 is the basic pitch, it 
may be held in the main memory 303 or HDD 304 as a key, 
as described above. 

Also, the method of extracting digital watermark 
data from a document image in which digital watermark 
data is embedded according to the aforementioned 
digital watermark embedding method is basically the 
same as the first embodiment except for the method of 
selecting distances d1 and d2 (the method of selecting 
d1 and d2 in the aforementioned digital watermark 
embedding process). 

That is, the basic pitch is calculated as in the 
fifth embodiment, or the key is acquired to be set as 
distance d2. Also, d1 is changed for each item of data 
to be embedded like the distance 1201, distance 1202, 
distance 1203, ... taking Fig. 12 as an example. 

After that, the same processes as in the first 
embodiment are executed to extract data embedded by the 
digital watermark embedding process according to this 
embodiment. 
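
The extraction for this embodiment can be sketched in Python 
as follows. The function name extract_bits is hypothetical, 
and returning None when d1 equals d2 is an assumption made 
for this sketch; the description does not state how a tie is 
handled. 

```python
def extract_bits(distances, basic_pitch):
    """Decode one bit per distance d1 by comparing it with d2 (basic pitch).

    distances: the d1 values in reading order (distance 1201, 1202, ...)
    basic_pitch: d2, either recalculated or loaded as a key
    """
    bits = []
    for d1 in distances:
        if d1 > basic_pitch:
            bits.append(1)      # d1 > d2 was enforced for data 1
        elif d1 < basic_pitch:
            bits.append(0)      # d1 < d2 was enforced for data 0
        else:
            bits.append(None)   # assumption: a tie is reported as undecided
    return bits
```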

However, when the entire document image is 
enlarged or reduced in size, extraction of information 
may be disabled since, unlike the above embodiments, 
this method also compares distances with a fixed value, 
i.e., the basic pitch, instead of comparing distances 
relative to each other. Such difficulty can be coped 
with by randomizing the information sequence as in the 
fifth embodiment. 

[Another Embodiment] 

The objects of the present invention are also 
achieved by supplying a storage medium (or recording 
medium), which records a program code of a software 
program that can implement the functions of the 
above-mentioned embodiments to the system or apparatus, 
and reading out and executing the program code stored 
in the storage medium by a computer (or a CPU or MPU) 
of the system or apparatus. In this case, the program 
code itself read out from the storage medium implements 
the functions of the above-mentioned embodiments, and 
the storage medium which stores the program code 
constitutes the present invention. The functions of 
the above-mentioned embodiments may be implemented not 
only by executing the readout program code by the 
computer but also by some or all of actual processing 
operations executed by an operating system (OS) running 
on the computer on the basis of an instruction of the 
program code. 

Furthermore, the functions of the above-mentioned 
embodiments may be implemented by some or all of actual 
processing operations executed by a CPU or the like 
arranged in a function extension card or a function 
extension unit, which is inserted in or connected to 
the computer, after the program code read out from the 
storage medium is written in a memory of the extension 
card or unit. When the present invention is applied to 
the storage medium, that storage medium stores the 
program codes corresponding to the aforementioned flow 
charts. 

Also, the storage medium includes communication 
media such as communication cables used in networks 
such as the Internet, LAN, and the like. That is, when 
the program codes of the aforementioned embodiments are 
held in a server apparatus on a network, a program can 
be installed in a computer by downloading that program 
from the server apparatus to the computer via the 
network. Hence, the installed program is executed by a 
control circuit such as a CPU, MPU, or the like on the 
computer and, as a result, the computer can implement 
the functions of the aforementioned embodiments. 
Therefore, the aforementioned storage medium includes 
the communication media such as communication cables 
used in the networks. 

As many apparently widely different embodiments 
of the present invention can be made without departing 
from the spirit and scope thereof, it is to be 
understood that the invention is not limited to the 
specific embodiments thereof except as defined in the 
claims. 